Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive Training

نویسندگان

Yamato Ohtani

Koichiro Mori

Masahiro Morita

چکیده

This paper describes novel voice quality control of synthetic speech using cluster adaptive training (CAT). In this method, we model voice quality factors labeled with perceptual expressions such as “Gender,” “Age” and “Brightness.” In advance, we obtain the intensity scores of the perceptual expressions by conducting a listening test, which evaluates differences of voice qualities between synthetic speech of average voice and that of the target. Then we build perceptual expression (PE) clusters that we call PE models (PEM) under the conditions that the average voice model is used as the bias cluster and the PE intensity scores are employed as the CAT weights. In synthesis, we can generate controlled synthetic speech by the linear combination of PEMs and the existing speaker’s model. Subjective results demonstrate that the proposed method can control the voice qualities with PEs in many cases and the target synthetic speech modified by PEMs achieves comparatively good speech quality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...

متن کامل

Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model

This paper proposes a deep neural network (DNN)-based statistical parametric speech synthesis system using an improved time-frequency trajectory excitation (ITFTE) model. The ITFTE model, which efficiently reduces the parametric redundancy of a TFTE model, improved the perceptual quality of the vocoding process and the estimation accuracy of the training process. However, there remain problems ...

متن کامل

Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech Synthesis

Training a high quality acoustic model with a limited database and synthesizing a new speaker’s voice with a few utterances have been hot topics in deep neural network (DNN) based statistical parametric speech synthesis (SPSS). To solve these problems, we built a unified framework for speaker adaptive training as well as speaker adaptation on Bidirectional Long ShortTerm Memory with Recurrent N...

متن کامل

Speaker and language adaptive training for HMM-based polyglot speech synthesis

This paper proposes a novel technique for speaker and language adaptive training for HMM-based statistical parametric polyglot speech synthesis. Language-specific context-dependencies in the system are captured using CAT with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by CMLLR-based transforms. This framework allows multi-speaker/multi-la...

متن کامل

Output-Based Objective Measure for Non-Intrusive Speech Quality Evaluation

This paper describes a newly developed output-based method for non-intrusive evaluation of speech quality of voice communication systems, and evaluates its performance. The method, which uses only the output of the system, is based on measuring perceptually motivated objective auditory distances between the voiced parts of the speech signal whose quality to be evaluated to appropriately matchin...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive Training

نویسندگان

چکیده

منابع مشابه

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model

Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech Synthesis

Speaker and language adaptive training for HMM-based polyglot speech synthesis

Output-Based Objective Measure for Non-Intrusive Speech Quality Evaluation

عنوان ژورنال:

اشتراک گذاری